Abstract

Development and optimization of novel species-specific microsatellites, or simple sequence repeats (SSRs), remains
an important step for studies in ecology, evolution, and behavior. Numerous approaches exist for identifying new
SSRs, and they vary widely in both time and cost. A recent approach that uses paired-end Illumina
sequence data in conjunction with the bioinformatics pipeline PAL_FINDER has the potential to substantially reduce
cost and labor while also improving efficiency. However, the approach does not appear to have
been widely adopted, perhaps due to concerns over its broad applicability across taxa. Therefore, to validate the
utility of the approach, we developed SSRs for 32 species representing 30 families, 25 orders, 11 classes, and six
phyla and optimized SSRs for 13 of the species. Overall, the Illumina paired-end (IPE) method worked extremely well:
we identified thousands of SSRs for all species (mean = 128,485), 17% of which were potentially amplifiable loci,
and 25% of these met our most stringent criteria, designed to avoid SSRs associated with repetitive elements.
Approximately 61% of screened primers yielded strong amplification of a single locus.

Introduction

Microsatellites, or simple sequence repeats (SSRs), are the 
genetic marker of choice for numerous applications in 
forensics, ecology, and evolution [1]. In particular their high 
variability and abundance across genomes make them ideal for 
studies of kinship, parentage, individual identification, 
population genetics, and linkage mapping (reviewed in [2]). In 
recent years, technological advances have brought other 
genetic markers into favor. For example, single nucleotide 
polymorphisms (SNPs) have gained favor for linkage studies 
[3], are increasingly being used in wildlife forensics [4], and,
with the development and improvement [5] of restriction-site
associated DNA (RAD) tag sequencing approaches for SNP
assays, are likely to be increasingly used in population genetics
studies (e.g., [6,7]). However, SSRs remain integral, as is
evidenced by a recent issue (vol 22 issue 4) of
Molecular Ecology in which over 50% of the original articles
relied on microsatellite analysis. In addition, new SSR loci are
still being developed (e.g., 58 papers describing
new SSR loci in Conservation Genetics Resources vol 4 no 4,
December 2012).

Although SSR loci remain the genetic marker of choice, their 
development is still considered to be expensive and labor 
intensive. For many years, SSR development involved creating 
libraries enriched for repeat motifs, cloning the library, and 
using traditional Sanger sequencing to identify clones with 
inserts positive for SSRs. With the advent of next-generation 
sequencing technologies, methods for development and 
characterization of SSRs have improved dramatically. Most 
notably, researchers began using the Roche 454 sequencing 
platform to sequence SSR-enriched libraries [8]. Since then, 
our lab has used the enrichment and 454 sequencing methods 
in combination across a broad range of taxa including 
vertebrates [9-12], invertebrates [13-15], and plants [16,17]. 
While the two methods in tandem have worked well, the
enrichment process is nonetheless time consuming, limits the
search to selected motifs, can require high concentrations of
DNA as starting material, and in some species can result in
inadvertent enrichment for transposable elements, which have
similar motifs to SSRs [18]. It is possible to avoid inadvertent
enrichment by employing shotgun sequencing on the 454
platform [19,20]; however, for species with large genomes or
infrequent SSRs the cost can be prohibitive. Recently, a more
cost-effective and efficient method for SSR development using
Illumina sequencing has been described [21]. Even with
the technological advances of next-generation sequencing, the
most common method for SSR detection still involves cloning
and Sanger sequencing. In the SSR development papers in the
issue of Conservation Genetics Resources mentioned above,
the authors used Sanger sequencing in 52%, 454 sequencing
(1/3 with enriched libraries) in 36%, and Illumina sequencing in
only one article.

In recent years, advances in Illumina sequencing have 
substantially increased the number of reads obtained. In 
addition, the cost of Illumina sequencing has decreased while 
the cost of 454 sequencing has remained stable. As a result, it 
is now cost efficient to use a shotgun sequencing approach 
with Illumina paired-end sequencing (IPE) 100 bp (HiSeq) or 
150 bp (GAIIx) to identify SSRs [21]. Castoe et al. [21]
demonstrated that for one species, the Burmese python,
shotgun sequencing via IPE and 454 yielded similar results and
that IPE reads worked well for two species of birds, even
though birds have a relatively low frequency of SSR loci [22].
Though Castoe et al. thoroughly described the SSR data from
the IPE reads, they did not validate the primers designed for
the three species. The method described by Castoe et al. is
highly promising; however, there are two major concerns with
the IPE method. First, the short reads may not provide
sufficient flanking sequence to design primers. Second,
when primers are designed there is no estimate of amplicon
length because the two sequences from the paired-end read
may not overlap, and thus numerous loci may be either too
short or too long for classical fragment analysis. Given the
apparent hesitancy of researchers to switch to next-generation
sequencing for SSR development, we sought to assess and
validate the IPE method for a variety of taxa. Our objectives
were 1) to compare two different IPE shotgun library
preparation protocols (one that requires 1 µg of DNA and one
that requires only 10 ng), 2) to use the IPE approach across a
broad range of taxa to assess the number of reads returned
positive for SSRs, the number of positive reads suitable for
primer design, and the types of SSRs identified, and 3) to
validate that primers designed via IPE will produce quality SSR
loci for genotyping purposes.

Methods 

Library preparation and sequencing 

Across a total of 32 species that span a wide taxonomic
range (table 1), we used two different methods (16 species
each) for creating Illumina paired-end shotgun libraries. The
first entailed shearing 1 µg of genomic DNA using a Covaris
S220, following the standard protocol of the Illumina TruSeq
DNA Library Kit, and using the multiplex identifier adaptor
indices. The second method followed the standard protocol of
the Nextera™ DNA Sample Prep Kit from Epicentre®, which uses
only 10 ng of genomic DNA and incorporates Illumina-compatible
bar codes. With both methods we pooled 4-8
libraries and conducted Illumina sequencing on the HiSeq with
100 bp paired-end reads. We demultiplexed the raw data using
Illumina's standard GERALD pipeline. Following
demultiplexing, we quality controlled reads for each species to 
remove bad reads. We wrote a Python QC script (available at
https://gist.github.com/jonesken/6226417) to: remove "B-tail"
bases (strings of bases with qualities less than Q15 at the end
of a read, denoted by the B quality score in Phred-64 data),
remove trimmed reads shorter than 50 bp, and reduce the files to
5M QC-passed paired reads. The resulting reads were
analyzed with the program PAL_FINDER_v0.02.03 [21] to
extract those reads that contained perfect di-, tri-, tetra-, penta-,
and hexanucleotide microsatellites and to batch positive reads to
a local installation of the program Primer3 (version 2.0.0) for
primer design.
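
As a rough illustration of the QC step described above, the following Python sketch trims B-tails and applies the 50 bp length filter. It is not the authors' script: the function names, the in-memory (sequence, quality) tuples, the requirement that both mates pass the length filter, and the choice to keep the first 5M passing pairs are all assumptions made for the example.

```python
MIN_LENGTH = 50          # trimmed reads shorter than this are discarded
MAX_PAIRS = 5_000_000    # cap on QC-passed read pairs kept per species


def trim_b_tail(seq, qual):
    """Remove the trailing run of 'B' quality characters (Phred-64 B-tail)."""
    keep = len(qual.rstrip("B"))
    return seq[:keep], qual[:keep]


def qc_pairs(pairs, min_length=MIN_LENGTH, max_pairs=MAX_PAIRS):
    """Yield trimmed read pairs in which both mates pass the length filter.

    `pairs` is an iterable of ((seq1, qual1), (seq2, qual2)) tuples.
    Assumption: a pair is dropped if either mate is too short after trimming.
    """
    kept = 0
    for (seq1, qual1), (seq2, qual2) in pairs:
        if kept >= max_pairs:
            break
        read1 = trim_b_tail(seq1, qual1)
        read2 = trim_b_tail(seq2, qual2)
        if len(read1[0]) >= min_length and len(read2[0]) >= min_length:
            kept += 1
            yield read1, read2
```

In practice the pairs would be streamed from the two demultiplexed FASTQ files rather than held in memory; the parsing is omitted here to keep the sketch short.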

Primer Screening 

For 13 of the 32 species, we tested forty-eight primer pairs
for clean amplification and polymorphism across DNA obtained 
from eight individuals per species. We performed all PCR 
amplifications in a 12.5-µL volume (10 mM Tris pH 8.4, 50 mM
KCl, 25.0 µg/ml BSA, 0.4 µM unlabeled primer, 0.04 µM
tag-labeled primer, 0.36 µM universal dye-labeled primer, 3.0 mM
MgCl2, 0.8 mM dNTPs, 0.5 units AmpliTaq Gold® Polymerase
(Applied Biosystems), and 20 ng DNA template) using an
Applied Biosystems GeneAmp 9700. For all loci, we used a
touchdown thermal cycling program [23] encompassing a 10°C
span of annealing temperatures ranging from 65°C to 55°C.
Touchdown cycling parameters consisted of an initial
denaturation step of 5 min at 95°C followed by 20 cycles of
95°C for 30 s, 65°C (decreased 0.5°C per cycle) for 30 s, and
72°C for 30 s; then 20 cycles of 95°C for 30 s, 55°C for 30 s,
and 72°C for 30 s; and a final extension at 72°C for 5 min. We
ran all PCR products on an ABI-3130xl sequencer and sized them
with the Naurox size standard prepared as described in DeWoody
et al. [24], except that unlabeled primers started with GTTT.
We used GeneMapper version 3.7 (Applied Biosystems) to 
analyze alleles. 
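
The touchdown program above can be written out as a per-cycle list of annealing temperatures. The short Python sketch below does this under one assumption: the first touchdown cycle anneals at 65°C and each later cycle is 0.5°C cooler, so the 20th cycle anneals at 55.5°C before the 20 fixed cycles at 55°C. The function name and parameters are hypothetical and serve only to illustrate the schedule.

```python
def touchdown_annealing_temps(start=65.0, step=0.5, touchdown_cycles=20,
                              final=55.0, final_cycles=20):
    """Return the annealing temperature (in °C) for every PCR cycle."""
    # Touchdown phase: temperature drops by `step` after each cycle.
    temps = [start - step * cycle for cycle in range(touchdown_cycles)]
    # Fixed phase: remaining cycles all anneal at the final temperature.
    temps += [final] * final_cycles
    return temps


schedule = touchdown_annealing_temps()
print(len(schedule), schedule[0], schedule[19], schedule[20])
```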

Data Analysis 

We performed all statistical tests using general linear models 
(GLM; SAS version 9.2, SAS 2009). We first tested the effect of 
library prep METHOD on the numbers of SSRs and PALs 
identified; with no difference in prep method detected, we 
removed METHOD from subsequent models. We tested for 
taxonomic effects on numbers of SSRs, PALs, and Premium 
PALs (see below) identified at the kingdom, phylum, and class 
levels. We calculated the proportions of repeat types (hexa-,
penta-, tetra-, tri-, and dinucleotides) out of all SSRs, the
proportions out of all PALs, and the proportion of Premium
PALs to PALs. Proportion data were arcsine square-root
transformed prior to analyses for taxonomic effects.
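
For readers unfamiliar with the transformation, the arcsine square-root of a proportion p is asin(sqrt(p)). The following minimal Python sketch applies it to a single proportion; the function name is ours, and the example value uses the pPAL and PAL counts reported for sample 1 (201 of 2,576) purely as an illustration.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1] (radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))


# Example: proportion of PALs that are pPALs for sample 1 (201 / 2,576).
print(arcsine_sqrt(201 / 2576))
```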

Results and Discussion 

To determine the overall efficiency of the method, we 
sequenced IPE libraries for 32 species across a wide 
taxonomic range (table 1; NCBI BioProject PRJNA209850).

Table 1. Taxonomic information for the 32 species sequenced.

Sample Number | Kingdom | Phylum | Class | Order | Family | Genus | Species
1 | Animalia | Arthropoda | Insecta | Coleoptera | Dytiscidae | Stictotarsus | aequinoctialis
2 | Animalia | Arthropoda | Insecta | Hemiptera | Plataspidae | Megacopta | cribraria
3 | Animalia | Arthropoda | Insecta | Lepidoptera | Nymphalidae | Junonia | coenia
4 | Animalia | Arthropoda | Insecta | Plecoptera | Capniidae | Mesocapnia | arizonensis
5 | Animalia | Arthropoda | Malacostraca | Decapoda | Lithodidae | Paralithodes | platypus
6 | Animalia | Arthropoda | Malacostraca | Decapoda | Ocypodidae | Uca | minax
7 | Animalia | Arthropoda | Malacostraca | Decapoda | Ocypodidae | Uca | spinicarpa
8 | Animalia | Chordata | Actinopterygii | Cypriniformes | Cyprinidae | Rhinichthys | osculus
9 | Animalia | Chordata | Actinopterygii | Salmoniformes | Salmonidae | Prosopium | williamsoni
10 | Animalia | Chordata | Amphibia | Caudata | Ambystomatidae | Ambystoma | talpoideum
11 | Animalia | Chordata | Amphibia | Caudata | Plethodontidae | Eurycea | cirrigera
12 | Animalia | Chordata | Aves | Charadriiformes | Alcidae | Alca | torda
13 | Animalia | Chordata | Aves | Charadriiformes | Alcidae | Ptychoramphus | aleuticus
14 | Animalia | Chordata | Aves | Passeriformes | Troglodytidae | Campylorhynchus | brunneicapillus
15 | Animalia | Chordata | Aves | Pelecaniformes | Pelecanidae | Pelecanus | occidentalis
16 | Animalia | Chordata | Aves | Pelecaniformes | Sulidae | Sula | bassanus
17 | Animalia | Chordata | Aves | Procellariiformes | Hydrobatidae | Oceanodroma | castro
18 | Animalia | Chordata | Mammalia | Cetacea | Delphinidae | Tursiops | truncatus
19 | Animalia | Chordata | Mammalia | Chiroptera | Phyllostomatidae | Ectophylla | alba
20 | Animalia | Chordata | Mammalia | Didelphimorphia | Didelphidae | Tlacuatzin | canescens
21 | Animalia | Chordata | Mammalia | Rodentia | Cricetidae | Onychomys | leucogaster
22 | Animalia | Chordata | Reptilia | Squamata | Colubridae | Lampropeltis | getula
23 | Animalia | Chordata | Reptilia | Squamata | Phrynosomatidae | Sceloporus | grammicus
24 | Animalia | Chordata | Reptilia | Testudines | Geoemydidae | Batagur | trivittata
25 | Animalia | Mollusca | Bivalvia | Unionoida | Unionidae | Leptodea | leptodon
26 | Plantae | Embryophyta | Equisetopsida | Asterales | Campanulaceae | Canarina | n/a
27 | Plantae | Magnoliophyta | Magnoliopsida | Asterales | Asteraceae | Solidago | gigantea
28 | Plantae | Magnoliophyta | Magnoliopsida | Caryophyllales | Cactaceae | Echinocereus | n/a
29 | Plantae | Magnoliophyta | Magnoliopsida | Fabales | Fabaceae | Lupinus | aridorum
30 | Plantae | Magnoliophyta | Magnoliopsida | Rosales | Rosaceae | Bencomia | exstipulata
31 | Plantae | Magnoliophyta | Magnoliopsida | Scrophulariales | Scrophulariaceae | Mimulus | ringens
32 | Plantae | Tracheophyta | Coniferopsida | Coniferales | Cupressaceae | Juniperus | cedrus

Sample number in bold indicates a Nextera library preparation method was used instead of the standard Illumina preparation.
doi: 10.1371/journal.pone.0081853.t001



Overall, the IPE method worked extremely well and we
identified thousands of SSRs for all species (mean = 128,485), with
the fewest (2,541) found in a bird species and the most
(644,886) in a crab (table 2). Due to the relatively short read
length of the IPE method as compared with Sanger sequencing
or 454, the ability to identify suitable primer sites was a
concern. However, enough suitable flanking sequence was
available for primer design in 17% of the reads with SSRs,
yielding on average 19,072 potentially amplifiable loci (PALs,
sensu [21]). Though 17% is not a large value, the vast
amount of data produced means that the process still yields ample PALs.
The library preparation method did not affect either the
number of microsatellites (F = 0.07, p = 0.79) or the number of
PALs identified (F = 0.05, p = 0.8176). Though the Nextera
method is more expensive, it allows the IPE method to be used
even when only 10 ng of DNA is available. The ability to use
very small quantities of DNA can be very important for species
in which only non-invasive samples can be used or DNA is
difficult to extract.
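
To make the notion of a "perfect" di- through hexanucleotide repeat concrete, the sketch below scans a single read with regular expressions. It is an illustration only and does not reproduce PAL_FINDER's implementation: the minimum repeat counts in MIN_REPEATS and the function names are hypothetical choices, and the thresholds actually used are not stated in the text.

```python
import re

# Hypothetical minimum number of tandem copies per motif length.
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k))


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, start, end) for each perfect SSR found in `seq`."""
    hits = []
    for k, n in min_repeats.items():
        # One captured motif of length k followed by at least n - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip homopolymers and motifs such as ACAC that are really 2mers.
            if is_primitive(motif):
                hits.append((motif, match.start(), match.end()))
    return hits


print(find_perfect_ssrs("GGT" + "AC" * 8 + "TTG"))
```

A read flagged this way would still need enough non-repetitive sequence for primer design before it could count as a PAL.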



We further filtered the PALs to identify those for which both 
the forward and reverse primer sequences were found only one 
time throughout the 5 million reads. These loci are deemed the 
loci with the best potential for clean amplification and are 
considered the Premium PALs (hereafter referred to as 
pPALs). One problem with older enrichment methods is the 
inadvertent selection of SSRs associated with transposable 
elements [18]. It is well described that for some taxa SSRs 
often occur in repetitive elements. When primers are designed 
for these SSRs, they often amplify multiple loci and accurately 
scoring such loci can be challenging or impossible. With 
PAL_FINDER_v0.02.03, it is possible to partially avoid these 
loci. By only working with loci that qualify as pPALs, it is less 
likely the primers will amplify multiple loci. Even using the 
stringent criteria for pPALs, we found over 100 loci for each 
species, over 500 for 27 species, and over 1000 for 19 species. 
Overall, ~25% of all PALs qualify as pPALs.
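
The pPAL criterion (both primer sequences occurring only once among the reads) can be sketched in a few lines of Python. This is a simplified illustration rather than the PAL_FINDER code: it uses exact string matching on the given strand only, the tuple layout and function names are assumptions, and the naive counting shown here would be far too slow for 5 million reads without an index.

```python
def count_occurrences(primer, reads):
    """Count exact occurrences of a primer sequence across all reads."""
    return sum(read.count(primer) for read in reads)


def premium_pals(pals, reads):
    """Return locus IDs whose forward and reverse primers each occur once.

    `pals` is a list of (locus_id, forward_primer, reverse_primer) tuples.
    """
    return [locus for locus, forward, reverse in pals
            if count_occurrences(forward, reads) == 1
            and count_occurrences(reverse, reads) == 1]


reads = ["AAACCCGGG", "TTTCCCAAA", "GGGTTTACG"]
pals = [("loc1", "AAACCC", "TTACG"), ("loc2", "CCC", "TTACG")]
print(premium_pals(pals, reads))
```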

Given the range of species included, we tested for effects
of taxonomy on SSR development. There was no effect of

Table 2. The number of paired end reads out of 5 million that contain microsatellites, and within those the number that
contain suitable sequence for primers and are considered potentially amplifiable loci (PALs).

Sample Number | Genus | Number of sequences with microsatellites | Number of PALs | 6mers | 5mers | 4mers | 3mers | 2mers
1 | Stictotarsus | 50,735 | 2,576 | 1,333 | 3,413 | 6,072 | 3,946 | 35,971
2 | Megacopta | 86,717 | 13,953 | 28 | 122 | 2,408 | 6,674 | 77,485
3 | Junonia | 62,927 | 6,998 | 250 | 34,241 | 1,790 | 4,599 | 6,747
4 | Mesocapnia | 73,137 | 13,090 | 2,462 | 11,669 | 9,277 | 14,391 | 35,338
5 | Paralithodes | 430,868 | 54,838 | 350 | 194,790 | 20,956 | 51,573 | 163,199
6 | Uca | 644,886 | 144,502 | 70 | 13,010 | 42,400 | 199,907 | 389,499
7 | Uca | 545,301 | 94,805 | 114 | 13,360 | 40,449 | 88,638 | 402,740
8 | Rhinichthys | 238,812 | 30,099 | 2,796 | 1,560 | 106,375 | 9,013 | 119,069
9 | Prosopium | 286,604 | 26,109 | 140 | 257 | 1,943 | 3,374 | 20,395
10 | Ambystoma | 5,970 | 1,582 | 4 | 70 | 290 | 554 | 664
11 | Eurycea | 27,272 | 4,198 | 1,572 | 1,043 | 16,853 | 4,281 | 3,523
12 | Alca | 14,288 | 2,136 | 4,189 | 2,054 | 2,246 | 1,995 | 3,804
13 | Ptychoramphus | 17,166 | 3,093 | 26 | 274 | 608 | 1,444 | 741
14 | Campylorhynchus | 113,109 | 4,760 | 64,127 | 28,928 | 11,599 | 5,837 | 2,618
15 | Pelecanus | 12,421 | 2,554 | 2,450 | 3,459 | 1,344 | 3,032 | 2,135
16 | Sula | 82,003 | 3,913 | 4,275 | 69,353 | 1,684 | 4,531 | 2,160
17 | Oceanodroma | 2,541 | 418 | 592 | 390 | 217 | 646 | 696
18 | Tursiops | 34,387 | 6,999 | 2,150 | 301 | 4,110 | 2,411 | 25,415
19 | Ectophylla | 25,278 | 7,403 | 2,774 | 253 | 4,344 | 3,096 | 14,811
20 | Tlacuatzin | 94,285 | 12,811 | 3,865 | 2,821 | 36,927 | 13,016 | 37,656
21 | Onychomys | 132,502 | 33,500 | 86 | 316 | 4,433 | 3,817 | 24,848
22 | Lampropeltis | 244,857 | 26,215 | 302 | 4,144 | 8,975 | 5,967 | 6,827
23 | Sceloporus | 139,529 | 46,255 | 4,320 | 1,092 | 21,778 | 63,513 | 48,827
24 | Batagur | 22,319 | 6,370 | 19 | 71 | 486 | 1,146 | 4,648
25 | Leptodea | 105,238 | 8,601 | 4,015 | 606 | 44,611 | 13,035 | 42,971
26 | Canarina | 37,868 | 7,242 | 8 | 12 | 60 | 1,440 | 5,722
27 | Solidago | 31,634 | 7,607 | 75 | 405 | 405 | 4,555 | 2,167
28 | Echinocereus | 60,583 | 6,964 | 58 | 539 | 1,159 | 2,597 | 2,611
29 | Lupinus | 391,973 | 5,845 | 105 | 2,154 | 426 | 1,841 | 1,319
30 | Bencomia | 42,786 | 14,777 | 1,295 | 723 | 606 | 14,632 | 25,530
31 | Mimulus | 32,170 | 7,232 | 400 | 147 | 484 | 7,907 | 23,232
32 | Juniperus | 21,352 | 2,853 | 18 | 36 | 87 | 1,375 | 1,337

Also included are the number of those SSRs that contained hexanucleotide, pentanucleotide, tetranucleotide, trinucleotide, or dinucleotide repeats. Sample number in bold
indicates a Nextera library preparation method was used instead of the standard Illumina preparation.
doi: 10.1371/journal.pone.0081853.t002



kingdom or phylum on the number of SSRs, PALs, or pPALs 
found; however, class significantly affected all three categories 
(table 3). Across classes, the number of SSRs was lowest in 
the Amphibia and highest in Malacostraca. The number of 
PALs found was lowest in Aves and again highest in 
Malacostraca. However, for both measures there is ample 
variation across species within a class, as can be seen by the 
standard deviations (Figure 1a, b). The frequency of pPALs also 
ranged widely across taxa (mean = 5,607; range 136 - 52,682; 
table 4; Figure 1c). In working with PALs, the most important 
information is the proportion of PALs that are pPALs. Both 
phylum and class significantly affected this proportion (table 3), 
where the lowest proportion occurs in insects and the highest 
in mammals (Figure 1d). To further illustrate this point, we
chose just one of the primer sequences (forward) and
examined its copy number in the entire dataset. Copy numbers
were frequently greater than 10,000 and in some cases
greater than 100,000 (Figure 2). In Eurycea,
numerous primer sequences had copy numbers in excess of
900,000. The distribution of copy numbers differs considerably
across taxa. In 3 of 4 mammalian taxa tested, the copy number of
most PALs was one and rarely exceeded 10 (Figure 2a). Contrast
this with insects and plants within the class Magnoliopsida, which
have relatively high PAL copy numbers (Figure 2b and 2c). The
benefit of using the IPE method in conjunction with
PAL_FINDER_v0.02.03 is the ability to identify and avoid these
loci when desired.
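
The copy-number classes used in Figure 2 (1, 2-10, 11-100, 101-1000, 1001-10,000, 10,001-100,000, and >100,000) amount to a simple binning of per-primer counts. The Python sketch below shows one way to compute the proportion of primers in each class; the function and variable names are hypothetical, and the input is assumed to be a list of copy numbers that has already been tallied.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bound of every class except the open-ended last one.
BIN_UPPER = [1, 10, 100, 1_000, 10_000, 100_000]
BIN_LABELS = ["1", "2-10", "11-100", "101-1000", "1001-10,000",
              "10,001-100,000", ">100,000"]


def bin_label(copy_number):
    """Return the Figure 2 class label for a primer copy number."""
    if copy_number < 1:
        raise ValueError("copy number must be at least 1")
    return BIN_LABELS[bisect_left(BIN_UPPER, copy_number)]


def copy_number_proportions(copy_numbers):
    """Proportion of primers falling in each copy-number class."""
    counts = Counter(bin_label(n) for n in copy_numbers)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in BIN_LABELS}
    return {label: counts[label] / total for label in BIN_LABELS}


print(copy_number_proportions([1, 1, 5, 900_000]))
```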

Interestingly, the types of SSRs found also varied across 
taxa. There was a significant effect of kingdom and phylum on 
the proportion of PALs and pPALs that were tetranucleotides, 
with fewer found in plants than animals (table 3). Class affected 
the proportion of most repeat types seen (table 3). As 
expected, dinucleotide repeats were overall the most common
and accounted for > 50% of the SSRs for most species and

Table 3. Results of General Linear Model analysis examining role of taxonomy on the number of sequences that had
microsatellites (No. msats), the number of PALs, the number of PALs that were different repeat types, the number of
premium PALs (pPALs), the number of pPALs that were different repeat types, and the proportion of PALs that were pPALs.

Response | Kingdom (2) | Phylum (7) | Class (11)
No. msats | NS | NS | <0.0001
No. PALs | NS | NS | <0.0001
PALs: 6mers | NS | NS | NS
PALs: 5mers | NS | NS | NS
PALs: 4mers | NS | NS | 0.0491
PALs: 3mers | NS | NS | 0.0016
PALs: 2mers | NS | 0.05 | <0.0001
Premium PALs | NS | NS | 0.0003
pPALs: 6mers | NS | NS | NS
pPALs: 5mers | NS | NS | NS
pPALs: 4mers | 0.06 | NS | 0.0061
pPALs: 3mers | NS | NS | 0.0032
pPALs: 2mers | NS | NS | 0.0001
pPALs/PALs | NS | 0.0207 | <0.0001

doi: 10.1371/journal.pone.0081853.t003



classes (table 2). However, when considering pPALs, Aves had
relatively fewer dinucleotides and more hexa-, penta-, and
trinucleotides than any other class. In amphibians, tetra-, tri-, and
dinucleotide repeats occurred at similar frequencies, and
tetranucleotides were relatively more common than in other classes. A vast
majority of pPALs were dinucleotides in both fish species
(83%) and the conifer species (84%). However, due to the
large number of SSRs identified, there are still numerous
non-dinucleotide pPALs to work with (651 in Rhinichthys, 1,379 in
Prosopium, and 469 in Juniperus).

For the 13 species for which we optimized primers, we had
clean amplification of a single locus for 61% of the loci when
using a single set of PCR conditions and cycling parameters
(table 5). Success varied across major groups, with ~49%, 60%,
and 67% amplifying in invertebrates, vertebrates, and plants,
respectively, and many other loci showing promise with
additional optimization. One perceived problem with the IPE
method is that once primers are designed the resulting
amplicon size cannot be predicted. As we always designed
primers in separate reads of the pair (i.e., forward primer in the
forward read, and the reverse primer in the reverse read), and
it was rarely the case that the paired ends overlapped, there
was always uncertainty in how much sequence exists between
the primers. Our methods only allowed us to visualize products
under 550 bp, thus it is possible that some primer pairs
amplified larger fragments that we could not detect. In
some cases, the resulting product was too small for accurate
sizing using our methods. This was a particular problem with
the bivalve. However, we have ascertained that when the
repetitive sequence was found in both of the paired reads the
resulting amplicon is often very small, likely due to an overly
short insert. After working with the bivalve, we began
ordering primers only for loci in which the SSR was found in one
direction only. This approach has eliminated short inserts, and
subsequently short amplicons, as a serious problem.
Alternatively, doing a strict size selection before sequencing
could also remove these shorter loci. In general, for those
species for which additional data on polymorphism and allelic
diversity have been collected, a good spread of size ranges
between 100 and 500 bp has been observed [25-29]. The
species that had the lowest success in yielding amplifiable loci 
was Stictotarsus. Interestingly, it also yielded a low proportion 
of pPALs, as well as very few tetranucleotide repeats, which in 
our experience amplify more cleanly. Developing robust SSR 
loci for Lepidopterans in general has been difficult, primarily 
due to the flanking sequences across loci being too similar 
([30] and references therein). Often only a few loci are 
generated per species (e.g., [31-34]). In our own experience 
with earlier methods, we screened 96 primer pairs to obtain five 
loci [35]. In the current study, we screened 48 primer pairs for 
Junonia coenia using only a single set of amplification 
conditions and identified 26 loci that produced strong peaks 
and did not appear to amplify multiple loci. 

Overall, our results demonstrate that Illumina paired-end
sequencing identifies large numbers of SSR loci across a wide
range of taxa. Additionally, using PAL_FINDER_v0.02.03 to
analyze and refine the SSR selection process results in a
high amplification success rate. In the current study we
analyzed 5M reads per species; however, with sufficient
resources much more data can be processed, and we have
now successfully analyzed up to 40M reads, allowing for further
refinement of PAL selection.

Lastly, as both of our library preparation techniques yielded 
similar results, this IPE method is ideal even when only a very 
small amount of genomic DNA is available.

Figure 1. The mean and 95% upper confidence limit (values in parentheses are high values that go off the scale) for the
number of SSRs (a), PALs (b), and pPALs (c), and the percent of PALs that were pPALs (d), observed across classes.

doi: 10.1371/journal.pone.0081853.g001

Table 4. Sample number and for each the number of pPALs found and the number that contained hexanucleotide,
pentanucleotide, tetranucleotide, trinucleotide, or dinucleotide repeats.

Sample Number | pPALs | 6mers | 5mers | 4mers | 3mers | 2mers
1 | 201 | 3 | 0 | 3 | 71 | 124
2 | 2,423 | 0 | 2 | 12 | 238 | 2,171
3 | 136 | 0 | 1 | 44 | 53 | 38
4 | 937 | 2 | 39 | 68 | 180 | 648
5 | 19,407 | 16 | 51 | 913 | 3,213 | 15,214
6 | 52,682 | 2 | 239 | 2,368 | 12,449 | 37,624
7 | 24,022 | 1 | 179 | 1,061 | 5,879 | 16,902
8 | 4,635 | 3 | 21 | 188 | 439 | 3,984
9 | 6,671 | 26 | 32 | 491 | 830 | 5,292
10 | 322 | 1 | 9 | 62 | 91 | 159
11 | 1,118 | 13 | 54 | 426 | 411 | 214
12 | 667 | 11 | 51 | 165 | 287 | 148
13 | 1,016 | 6 | 83 | 246 | 419 | 262
14 | 845 | 29 | 59 | 149 | 377 | 231
15 | 626 | 9 | 55 | 107 | 317 | 138
16 | 949 | 20 | 69 | 119 | 442 | 299
17 | 165 | 1 | 11 | 29 | 69 | 56
18 | 2,150 | 2 | 8 | 261 | 297 | 1,582
19 | 3,178 | 8 | 29 | 442 | 454 | 2,246
20 | 7,049 | 30 | 65 | 1,062 | 1,595 | 4,297
21 | 17,797 | 39 | 120 | 1,914 | 1,695 | 14,029
22 | 6,314 | 48 | 474 | 1,948 | 1,563 | 2,281
23 | 14,511 | 10 | 107 | 2,014 | 6,509 | 5,871
24 | 2,545 | 8 | 22 | 169 | 411 | 1,935
25 | 1,163 | 0 | 3 | 91 | 285 | 784
26 | 2,722 | 2 | 6 | 15 | 413 | 2,286
27 | 813 | 6 | 38 | 49 | 466 | 254
28 | 1,208 | 9 | 97 | 94 | 422 | 586
29 | 803 | 6 | 145 | 65 | 382 | 205
30 | 402 | 8 | 6 | 10 | 97 | 281
31 | 791 | 3 | 2 | 5 | 195 | 586
32 | 1,180 | 3 | 6 | 39 | 421 | 711

doi: 10.1371/journal.pone.0081853.t004

Figure 2. Frequency histograms of forward primer sequence copy number within 5 million paired end reads. The
proportion of all primers observed 1, 2-10, 11-100, 101-1000, 1001-10,000, 10,001-100,000, or >100,000 times is shown for
Mammalia (a), Insecta (b), and Magnoliopsida (c).

doi: 10.1371/journal.pone.0081853.g002

Table 5. Forty-eight primers were tested for amplification across 13 species.

Amplification Result (by Species Sample Number) | 1 | 2 | 3 | 4 | 5 | 8 | 9 | 10 | 11 | 21 | 24 | 25 | 31
Number of loci with good amplification | 11 | 24 | 26 | 25 | 19 | 23 | 29 | 11 | 22 | 29 | 40 | 11 | 30
Number of loci with good amplification, but were too small (e.g., <100bp) | 0 | 3 | 2 | 0 | 0 | 1 | 5 | 6 | 3 | 4 | 1 | 24 | 1
Number of loci that would require further optimization | 14 | 12 | 10 | 9 | 11 | 15 | 3 | 16 | 13 | 5 | 5 | 9 | 8
Number of loci that yielded zero amplification | 23 | 9 | 10 | 14 | 18 | 9 | 11 | 15 | 10 | 10 | 2 | 4 | 8

doi: 10.1371/journal.pone.0081853.t005